Unsupervised Morphology Induction Using Word Embeddings
Authors
Abstract
We present a language-agnostic, unsupervised method for inducing morphological transformations between words. The method relies on certain regularities manifest in high-dimensional vector spaces. We show that this method is capable of discovering a wide range of morphological rules, which in turn are used to build morphological analyzers. We evaluate this method across six different languages and nine datasets, and show significant improvements across all languages.
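For intuition, the "regularities" referred to here are analogy-like: word pairs related by the same morphological transformation tend to be separated by similar offset vectors in embedding space. The following is a minimal sketch of that idea only, not the authors' pipeline; the embedding file path, the use of gensim's KeyedVectors loader, and the example word pairs are all illustrative assumptions.

import numpy as np
from gensim.models import KeyedVectors

# Hypothetical path to pre-trained embeddings in word2vec text format.
vectors = KeyedVectors.load_word2vec_format("embeddings.txt", binary=False)

def offset(w1, w2):
    # Difference vector encoding the transformation w1 -> w2.
    return vectors[w2] - vectors[w1]

def cosine(u, v):
    # Cosine similarity between two vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Two pairs that (hypothetically) instantiate the same suffixation rule.
rule_a = offset("walk", "walking")
rule_b = offset("play", "playing")

# A high cosine similarity between the two offsets is the kind of
# vector-space regularity from which a morphological rule could be proposed.
print(cosine(rule_a, rule_b))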
Similar Resources
Unsupervised POS Induction with Word Embeddings
Unsupervised word embeddings have been shown to be valuable as features in supervised learning problems; however, their role in unsupervised problems has been less thoroughly explored. In this paper, we show that embeddings can likewise add value to the problem of unsupervised POS induction. In two representative models of POS induction, we replace multinomial distributions over the vocabulary ...
Word sense induction using word embeddings and community detection in complex networks
Word Sense Induction (WSI) is the ability to automatically induce word senses from corpora. The WSI task was first proposed to overcome the limitations of the manually annotated corpora that are required in word sense disambiguation systems. Even though several works have been proposed to induce word senses, existing systems are still very limited in the sense that they make use of structured, domai...
Unsupervised Word Mapping Using Structural Similarities in Monolingual Embeddings
Most existing methods for automatic bilingual dictionary induction rely on prior alignments between the source and target languages, such as parallel corpora or seed dictionaries. For many language pairs, such supervised alignments are not readily available. We propose an unsupervised approach for learning a bilingual dictionary for a pair of languages given their independently-learned monoling...
KPCA Embeddings: An Unsupervised Approach to Learn Vector Representations of Finite Domain Sequences
Most of the well-known word embeddings from the last few years rely on a predefined vocabulary so that out-of-vocabulary words are usually skipped when they need to be processed. This may cause a significant quality drop in document representations that are built upon them. Additionally, most of these models do not incorporate information about the morphology of the words within the word vector...
Unsupervised Does Not Mean Uninterpretable: The Case for Word Sense Induction and Disambiguation
The current trend in NLP is the use of highly opaque models, e.g. neural networks and word embeddings. While these models yield state-of-the-art results on a range of tasks, their drawback is poor interpretability. Using word sense induction and disambiguation (WSID) as an example, we show that it is possible to develop an interpretable model that matches the state-of-the-art models in accuracy. ...